Conversation

@tvegas1 (Contributor) commented Oct 3, 2025

What?

Prefer interfaces that are closer to the memory being sent. In some cases this takes us from 15GB/s to 357GB/s.

Why?

Some systems have HCAs and a GPU under the same PCI bridge. In that case, although all interfaces have the same nominal bandwidth, protocol selection should prefer the sibling interfaces. This is particularly true when using multiple GPUs with pairwise traffic (GPUx to remote GPUx), where traffic should remain on the sibling interfaces.

How?

The code fetches the interfaces' bandwidth and checks whether it needs to be further restricted due to system topology (three possible values: infinite, 17GB/s, or 0.2GB/s). With slower interfaces (11GB/s) this does not work, and we cannot tell apart interfaces that sit under the same PCI bridge from those that do not.

The proposal is to boost the bandwidth in that case, but ideally bandwidth, latency, distance, and even partitioning of the traffic should be taken into account. We could also have protocol selection not rely only on raw bandwidth.

    if ((device_sys_dev != UCS_SYS_DEVICE_ID_UNKNOWN) &&
        (sys_dev != device_sys_dev) &&
        ucs_topo_is_pci_bridge(device_sys_dev, sys_dev)) {
        /* Interface shares a PCI bridge with the memory's device:
         * report a higher bandwidth so protocol selection prefers it */
        tl_perf->bandwidth *= 1.2;
    }
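
For illustration only, here is a minimal standalone sketch of the effect on two 11GB/s HCAs; the types and the is_pci_bridge_sibling helper are hypothetical stand-ins, not the actual UCX topology API:

    #include <stdio.h>

    /* Hypothetical stand-ins for the system-device id and topology
     * query used in the snippet above. */
    typedef int sys_dev_t;
    #define SYS_DEVICE_ID_UNKNOWN (-1)

    static int is_pci_bridge_sibling(sys_dev_t a, sys_dev_t b)
    {
        /* Pretend devices 0 and 1 sit under the same PCI bridge. */
        return (a == 0 && b == 1) || (a == 1 && b == 0);
    }

    /* Proposed boost: an interface sharing a PCI bridge with the
     * memory's device reports 20% more bandwidth, so selection (which
     * today looks only at raw bandwidth) picks it first. */
    static double effective_bw(double raw_bw, sys_dev_t mem_dev,
                               sys_dev_t iface_dev)
    {
        if ((mem_dev != SYS_DEVICE_ID_UNKNOWN) &&
            (mem_dev != iface_dev) &&
            is_pci_bridge_sibling(mem_dev, iface_dev)) {
            return raw_bw * 1.2;
        }
        return raw_bw;
    }

    int main(void)
    {
        sys_dev_t gpu = 0;   /* memory lives on this device */
        double raw   = 11.0; /* both HCAs report 11GB/s     */

        printf("sibling HCA: %.1f GB/s\n", effective_bw(raw, gpu, 1)); /* 13.2 */
        printf("far HCA    : %.1f GB/s\n", effective_bw(raw, gpu, 2)); /* 11.0 */
        return 0;
    }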

Contributor

I think it's ok as a quickfix

I agree about taking multiple factors into account when choosing a protocol. This is actually implemented in protocol variants: #10778
The idea is to select lanes by score (not just BW), where both BW and latency contribute. With this approach, a better way might be to decrease the system latency for an iface on the same PCI bridge (or increase it if not).
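
To make the score idea concrete, a minimal sketch of latency-based adjustment (the field names, the 0.5 latency factor, and the score formula are assumptions for illustration, not the protocol-variants implementation from #10778):

    #include <stdio.h>

    /* Hypothetical per-interface performance estimate. */
    typedef struct {
        const char *name;
        double      bandwidth;   /* bytes/sec                               */
        double      sys_latency; /* extra latency from system topology, sec */
        int         same_bridge; /* shares a PCI bridge with the memory     */
    } iface_perf_t;

    /* One possible score for a given message size: estimated transfer
     * time, lower is better. Same-bridge interfaces get their system
     * latency reduced instead of their bandwidth boosted. */
    static double iface_score(const iface_perf_t *p, double msg_size)
    {
        double lat = p->same_bridge ? p->sys_latency * 0.5 : p->sys_latency;
        return lat + msg_size / p->bandwidth;
    }

    int main(void)
    {
        iface_perf_t ifaces[] = {
            {"mlx5_0 (sibling)", 11e9, 2e-6, 1},
            {"mlx5_1 (far)",     11e9, 2e-6, 0},
        };
        double msg = 65536.0;

        for (int i = 0; i < 2; i++) {
            printf("%-18s score=%.3g sec\n", ifaces[i].name,
                   iface_score(&ifaces[i], msg));
        }
        return 0;
    }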
